Scalable Hypergraph Processing

Authors

Jin Huang

Rui Zhang

Jeffrey Xu Yu

Abstract

A hypergraph allows a hyperedge to connect more than two vertices. By using hyperedges to capture high-order relationships, many hypergraph learning algorithms have been shown to be highly effective in various applications. When learning on large hypergraphs, a common approach is to convert them to graphs so that distributed graph frameworks can be employed, yet this results in major efficiency drawbacks, including an inflated problem size, excessive replicas, and unbalanced workloads. To avoid these drawbacks, we take a different approach and propose HyperX, a thin layer built upon Spark. To preserve the problem size, HyperX operates directly on a distributed hypergraph. To reduce the replicas, HyperX replicates the vertices but not the hyperedges. To balance the workloads, we investigate the hypergraph partitioning problem, aiming to minimize the space and communication cost subject to two separate constraints on the hyperedge and vertex workloads. With experiments on both real and synthetic datasets, we verify that HyperX significantly improves the efficiency of the learning algorithms compared with the graph conversion approach.
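The "inflated problem size" caused by graph conversion can be illustrated with a small sketch. The abstract does not say which conversion is meant; the code below assumes clique expansion, where every hyperedge is replaced by pairwise edges among its vertices. The function names `clique_expand` and `incidence_size` and the toy data are hypothetical, chosen only for illustration, and are not part of HyperX.

```python
from itertools import combinations


def clique_expand(hyperedges):
    """Convert a hypergraph to a graph by clique expansion (an assumed
    conversion): each hyperedge becomes pairwise edges among its vertices."""
    edges = set()
    for hyperedge in hyperedges:
        # A hyperedge with k vertices contributes up to k * (k - 1) / 2 edges.
        for u, v in combinations(sorted(set(hyperedge)), 2):
            edges.add((u, v))
    return edges


def incidence_size(hyperedges):
    """Size of the native hypergraph representation: one entry per
    (hyperedge, vertex) incidence."""
    return sum(len(set(hyperedge)) for hyperedge in hyperedges)


# Toy hypergraph: one hyperedge over six vertices, one over three.
hyperedges = [{1, 2, 3, 4, 5, 6}, {5, 6, 7}]

graph_edges = clique_expand(hyperedges)
print("edges after clique expansion:", len(graph_edges))
print("hypergraph incidences:", incidence_size(hyperedges))
```

Under this assumption the converted graph grows quadratically in the hyperedge size, while the native representation grows linearly, which is one plausible reading of why operating directly on the hypergraph preserves the problem size.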


Similar resources

Enabling Scalable Social Group Analytics via Hypergraph Analysis Systems

With the rapid growth of large online social networks, the ability to analyze large-scale social structure and behavior has become critically important, and this has led to the development of several scalable graph processing systems. In reality, social interaction takes place not just between pairs of individuals as in the common graph model, but rather in the context of multi-user groups. Res...


Social Hash Partitioner: A Scalable Distributed Hypergraph Partitioner

We design and implement a distributed algorithm for balanced k-way hypergraph partitioning that minimizes fanout, a fundamental hypergraph quantity also known as the communication volume and (k − 1)-cut metric, by optimizing a novel objective called probabilistic fanout. This choice allows a simple local search heuristic to achieve comparable solution quality to the best existing hypergraph par...


HYDRA: HYpergraph-Based Distributed Response-Time Analyzer

It is important for almost all transaction processing and computer-communication systems to satisfy response time quantile targets. This paper describes HYDRA, a scalable parallel tool for the analytical determination of response time densities in large, structurally-unrestricted Markov models derived from high-level specifications. The tool exploits an efficient distributed uniformization-base...


Hypergraph Partitioning for Faster Parallel PageRank Computation

The PageRank algorithm is used by search engines such as Google to order web pages. It uses an iterative numerical method to compute the maximal eigenvector of a transition matrix derived from the web’s hyperlink structure and a user-centred model of web-surfing behaviour. As the web has expanded and as demand for user-tailored web page ordering metrics has grown, scalable parallel computation ...


Similarity analysis with advanced relationships on big data

Similarity analytic techniques such as distance based joins and regularized learning models are critical tools employed in numerous data mining and machine learning tasks. We focus on two typical techniques in the context of large scale data and distributed clusters. Advanced distance metrics such as the Earth Mover’s Distance (EMD) are usually employed to capture the similarity between data dim...




Publication date: 2015
